API & BUG: allow list-like y argument to df.plot & fix integer arg to x,y #20000

masongallo · 2018-03-05T20:01:24Z

closes Allow list-like for y in DataFrame.plot. #19699
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

I added support for passing a list-like to y, as discussed in #19699. The API to df.plot is relatively complex, with lots of args, so let me know if you have questions or feedback. I'm not sure I covered everything for the docs.
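To make the request concrete, here is a minimal pandas-only sketch of which data a list-like `y` is meant to select. It assumes, as a simplification, that `x` and `y` are column labels and that the plotted data is equivalent to indexing on `x` and keeping only the `y` columns. `frame_to_plot` is a hypothetical helper, not part of pandas or of this PR.

```python
import pandas as pd


def frame_to_plot(df, x, y):
    """Hypothetical helper: approximate the data df.plot(x=x, y=y) would draw.

    Assumes x and y are column labels (not positions). A scalar y is
    wrapped in a list so the result is always a DataFrame.
    """
    y_cols = list(y) if isinstance(y, (list, tuple)) else [y]
    return df.set_index(x)[y_cols]


df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
plotted = frame_to_plot(df, x="A", y=["B", "C"])
print(plotted.columns.tolist())  # ['B', 'C']: one line per column
print(plotted.index.tolist())    # [1, 2]: values of A along the x-axis
```

Each column of the returned frame would become one line, with the values of `A` on the x-axis.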

TomAugspurger

Looks good, thanks. Just a couple of edge cases that I think need to be tested (your code should handle them already; this is just to guard against future regressions).

TomAugspurger · 2018-03-05T23:01:19Z

pandas/tests/plotting/test_frame.py

+    def test_y_listlike(self, y, lbl):
+        # GH 19699
+        df = DataFrame({"A": [1, 2], 'B': [3, 4], 'C': [5, 6]})
+        _check_plot_works(df.plot, x='A', y=y, label=lbl)


Could you get the ax from df.plot and assert that it has two lines? I think len(ax.lines) should work.

We should also check the color. What happens? Ideally it's the same as df.set_index('x').plot(), so two different colors.
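The two checks asked for above (line count and colors) could look roughly like this against today's public API. This is a sketch, not the test that was merged; it assumes matplotlib is installed and selects the non-interactive Agg backend so it runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

# New behaviour: several y columns plotted against one x column.
ax = df.plot(x="A", y=["B", "C"])
colors = [line.get_color() for line in ax.get_lines()]

# Reference plot suggested in the review: index on A, plot the remaining columns.
ref_ax = df.set_index("A").plot()
ref_colors = [line.get_color() for line in ref_ax.get_lines()]

assert len(ax.lines) == 2       # one line per y column
assert len(set(colors)) == 2    # the two lines get different colors
assert colors == ref_colors     # same colors as the set_index version
```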

TomAugspurger · 2018-03-05T23:03:40Z

pandas/tests/plotting/test_frame.py

+        # GH 19699
+        df = DataFrame({"A": [1, 2], 'B': [3, 4], 'C': [5, 6]})
+        _check_plot_works(df.plot, x='A', y=y, label=lbl)
+


Could you add tests for

all integer columns: x=0, y=[1, 2]

Mix of int and named columns. x=0, y=['A', 2]

> Mix of int and named columns. x=0, y=['A', 2]

Good point. Shouldn't this raise tho? If you try this on a DataFrame you'll get a KeyError. IMO we shouldn't allow users to specify a mix of int & named cols since it's unclear what you actually want.
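The `KeyError` mentioned above comes from ordinary column selection: `__getitem__` treats every element of the list as a label, so an integer intended as a position is looked up as a label and is not found. A short pandas sketch, using the same frame as the test:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})

# "A" is a label, 2 is meant as a position; __getitem__ treats both as labels.
try:
    df[["A", 2]]
    raised_key_error = False
except KeyError as err:
    raised_key_error = True
    print("KeyError:", err)

# Position 2 has to be resolved explicitly, e.g. through iloc.
print(df.iloc[:, 2].name)  # C
```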

TomAugspurger · 2018-03-05T23:05:04Z

pandas/plotting/_core.py

+                else:
+                    match = is_list_like(label_kw) and len(label_kw) == len(y)
+                    if label_kw and not match:
+                        raise ValueError(


Can you add a test that triggers this ValueError?

where is this test?

https://github.com/MasonGallo/pandas/blob/d6d824f3097f2ba48f8778d65422d8471a9f746f/pandas/tests/plotting/test_frame.py#L2176
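For reference, the label check quoted above can be exercised on its own. The sketch below copies its logic into a hypothetical function `check_labels` so the raising and non-raising cases can be tried without plotting. The error message text is an assumption, since the diff cuts off at `raise ValueError(`.

```python
from pandas.api.types import is_list_like


def check_labels(y, label):
    """Hypothetical stand-alone copy of the label check from the diff.

    A list-like y needs either no label (falsy) or a list-like label of
    the same length; anything else raises ValueError.
    """
    match = is_list_like(label) and len(label) == len(y)
    if label and not match:
        # Message wording is assumed; the quoted diff is truncated here.
        raise ValueError("label should be list-like and same length as y")


check_labels(["B", "C"], ["b", "c"])  # OK: one label per column
check_labels(["B", "C"], False)       # OK: no label given

for bad in (["b"], "b"):              # too short / not list-like
    try:
        check_labels(["B", "C"], bad)
    except ValueError as err:
        print(repr(bad), "->", err)
```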

codecov · 2018-03-05T23:27:07Z

Codecov Report

Merging #20000 into master will increase coverage by 0.03%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #20000      +/-   ##
==========================================
+ Coverage   91.77%    91.8%   +0.03%     
==========================================
  Files         152      152              
  Lines       49205    49222      +17     
==========================================
+ Hits        45159    45190      +31     
+ Misses       4046     4032      -14

Flag	Coverage Δ
#multiple	`90.19% <100%> (+0.03%)`	⬆️
#single	`41.84% <0%> (-0.02%)`	⬇️

Impacted Files	Coverage Δ
pandas/plotting/_core.py	`82.5% <100%> (+0.23%)`	⬆️
pandas/core/window.py	`96.26% <0%> (-0.01%)`	⬇️
pandas/io/json/normalize.py	`96.93% <0%> (+0.06%)`	⬆️
pandas/util/testing.py	`84.11% <0%> (+0.16%)`	⬆️
pandas/plotting/_converter.py	`66.81% <0%> (+1.73%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7273ea0...37a6eca.

TomAugspurger · 2018-03-06T16:33:29Z

Yeah, raising is probably best.

…

On Tue, Mar 6, 2018 at 7:34 AM, Mason Gallo wrote (commenting on this pull request, in pandas/tests/plotting/test_frame.py):

>         df = DataFrame([[1, 3, 5], [2, 4, 6]], columns=list('AAB'))
>         with pytest.raises(ValueError):
>             df.plot(x=x, y=y)
> +    @pytest.mark.parametrize("y,lbl", [
> +        (['B'], ['b']),
> +        (['B', 'C'], ['b', 'c'])
> +    ])
> +    def test_y_listlike(self, y, lbl):
> +        # GH 19699
> +        df = DataFrame({"A": [1, 2], 'B': [3, 4], 'C': [5, 6]})
> +        _check_plot_works(df.plot, x='A', y=y, label=lbl)
> +
>
> > Mix of int and named columns. x=0, y=['A', 2]
>
> Good point. Shouldn't this raise tho? If you try this on a DataFrame you'll get a KeyError. IMO we shouldn't allow users to specify a mix of int & named cols since it's unclear what you actually want.

masongallo · 2018-03-08T22:05:38Z

I addressed #20056 and added test cases to cover it. Also added a bunch more test cases to increase coverage as requested.
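One plausible way integer positions break (an assumption about #20056, whose text is not quoted here): once `x` is resolved and moved into the index with `set_index`, the remaining columns shift, so a `y` position has to be resolved against the columns saved beforehand. A pandas-only illustration:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
x, y = 0, 1  # both given as positions

original_cols = df.columns               # saved before the frame is reshaped
indexed = df.set_index(df.columns[x])    # x=0 resolves to "A"

# After set_index("A") only "B" is left, so position 1 no longer exists.
print(indexed.columns.tolist())  # ['B']
print(original_cols[y])          # B: resolving y against the saved columns works
```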

masongallo · 2018-03-08T23:34:21Z

Any ideas on the failure on CircleCI? I'm on a mac so it looks like the test gets skipped locally?

jreback · 2018-03-09T00:32:44Z

doc/source/whatsnew/v0.23.0.txt

@@ -965,6 +965,8 @@ Plotting
 ^^^^^^^^

 - :func:`DataFrame.plot` now raises a ``ValueError`` when the ``x`` or ``y`` argument is improperly formed (:issue:`18671`)
+- :func:`DataFrame.plot` now supports multiple columns to the ``y`` argument (:issue:`19699`)


this entry should go in the Other Enhancements section

Could you move this to other enhancements?

jreback · 2018-03-09T00:34:03Z

pandas/plotting/_core.py

@@ -1706,21 +1706,37 @@ def _plot(data, x=None, y=None, subplots=False,
        plot_obj = klass(data, subplots=subplots, ax=ax, kind=kind, **kwds)
    else:
        if isinstance(data, ABCDataFrame):
+            new_data = data.copy()  # don't modify until necessary


this is not necessary; just use 'data', don't add a new variable here

I added the copy to fix the bug from #20056 so we get correct integer indexing - what did you have in mind?

I think it's best to convert from integer to labels as soon as possible. Can you check if you can do it at the very start of _plot? Then you don't have to worry about this.
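A minimal sketch of that suggestion, assuming positions are translated once, up front, and that a mix of labels and positions is rejected as discussed earlier in the thread. `positions_to_labels` is a hypothetical name; it uses `is_integer_dtype` on the columns in place of the `holds_integer()` call from the diff, and it is not the code that was merged.

```python
import pandas as pd
from pandas.api.types import is_integer, is_integer_dtype, is_list_like


def positions_to_labels(df, key):
    """Hypothetical helper: map integer positions in key to column labels.

    Integers are only treated as positions when the columns are not
    themselves integer labels. Mixing labels and positions raises.
    """
    if key is None or is_integer_dtype(df.columns):
        return key  # nothing to resolve, or integers are labels
    if is_integer(key):
        return df.columns[key]
    if is_list_like(key):
        ints = [is_integer(k) for k in key]
        if all(ints):
            return [df.columns[k] for k in key]
        if any(ints):
            raise ValueError("cannot mix column labels and positions")
    return key  # already a label (or list of labels)


df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
x = positions_to_labels(df, 0)       # 'A'
y = positions_to_labels(df, [1, 2])  # ['B', 'C']
print(x, y)
```

Doing this at the top of the plotting function means later steps, such as `set_index(x)`, only ever see labels.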

jreback · 2018-03-10T01:55:49Z

pandas/tests/plotting/test_frame.py

+    ])
+    def test_xy_args_integer(self, x, y, colnames):
+        # GH 20056
+        df = DataFrame({"A": [1, 2], 'B': [3, 4]})


Why do you think this should be allowed? Having both labels and positions is very confusing.

I agree that it's confusing and definitely don't think it should be allowed. Let's remove support for positions?
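The ambiguity is easiest to see with integer column labels that differ from their positions, which is why the diff only treats integers as positions when the columns do not hold integers. A small illustration:

```python
import pandas as pd

# Integer column labels that do not match their positions.
df = pd.DataFrame([[10, 20], [30, 40]], columns=[1, 0])

by_label = df[0].tolist()             # column *labelled* 0 (the second one)
by_position = df.iloc[:, 0].tolist()  # column at *position* 0 (the first one)
print(by_label, by_position)          # [20, 40] [10, 30]
```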

TomAugspurger · 2018-03-16T11:51:01Z

> Any ideas on the failure on CircleCI? I'm on a mac so it looks like the test gets skipped locally?

Merge master into your branch and repush. That'll fix it.

TomAugspurger · 2018-03-16T11:51:32Z

doc/source/whatsnew/v0.23.0.txt

@@ -965,6 +965,8 @@ Plotting
 ^^^^^^^^

 - :func:`DataFrame.plot` now raises a ``ValueError`` when the ``x`` or ``y`` argument is improperly formed (:issue:`18671`)
+- :func:`DataFrame.plot` now supports multiple columns to the ``y`` argument (:issue:`19699`)
+- Bug in :func:`DataFrame.plot` with ``x`` or ``y`` arguments as positions (:issue:`20056`)


Could you describe the issue a bit more?

TomAugspurger · 2018-03-16T11:52:59Z

doc/source/whatsnew/v0.23.0.txt

@@ -965,6 +965,8 @@ Plotting
 ^^^^^^^^

 - :func:`DataFrame.plot` now raises a ``ValueError`` when the ``x`` or ``y`` argument is improperly formed (:issue:`18671`)
+- :func:`DataFrame.plot` now supports multiple columns to the ``y`` argument (:issue:`19699`)


Could you move this to other enhancements?

TomAugspurger · 2018-03-16T11:56:07Z

pandas/plotting/_core.py

@@ -1706,21 +1706,37 @@ def _plot(data, x=None, y=None, subplots=False,
        plot_obj = klass(data, subplots=subplots, ax=ax, kind=kind, **kwds)
    else:
        if isinstance(data, ABCDataFrame):
+            new_data = data.copy()  # don't modify until necessary


I think it's best to convert from integer to labels as soon as possible. Can you check if you can do it at the very start of _plot? Then you don't have to worry about this.

masongallo · 2018-03-16T14:28:55Z

@TomAugspurger good suggestion! I think it's doable depending on where we land with the API discussion -

jreback · 2018-03-20T00:13:17Z

pandas/plotting/_core.py

+                        "y must be a label or position or list of them"
+                    )
+                label_kw = kwds['label'] if 'label' in kwds else False
+                new_data = data[y].copy()


just call this data

jreback · 2018-03-20T00:13:30Z

pandas/plotting/_core.py

+                label_kw = kwds['label'] if 'label' in kwds else False
+                new_data = data[y].copy()
+
+                if isinstance(data[y], ABCSeries):


you can just test if y is_scalar

Duplicate column names may mess that up, not sure if we allow that here though.

> Duplicate column names may mess that up

right, that's why I have the check for ABCSeries
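For context on the ABCSeries check: with duplicate column names, selecting a single (scalar) label can return a DataFrame, so checking `y` alone does not tell you the shape of `data[y]`. A sketch using the public `pd.Series` / `pd.DataFrame` types and the duplicated-columns frame from the existing test:

```python
import pandas as pd

# Same frame as the existing test: "A" appears twice.
df = pd.DataFrame([[1, 3, 5], [2, 4, 6]], columns=list("AAB"))

print(type(df["B"]).__name__)  # Series: unique label
print(type(df["A"]).__name__)  # DataFrame: duplicated label
```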

jreback · 2018-03-20T00:13:54Z

pandas/tests/plotting/test_frame.py

+        (0, [1, 2], ['bokeh', 'cython'], ['green', 'yellow'])
+    ])
+    def test_y_listlike(self, x, y, lbl, colors):
+        # GH 19699


can you give a one-line explanation?

jreback · 2018-03-20T00:14:01Z

pandas/tests/plotting/test_frame.py

+        (1, 0, [0, 1])
+    ])
+    def test_xy_args_integer(self, x, y, colnames):
+        # GH 20056


jreback · 2018-03-20T00:14:28Z

pandas/plotting/_core.py

+                else:
+                    match = is_list_like(label_kw) and len(label_kw) == len(y)
+                    if label_kw and not match:
+                        raise ValueError(


where is this test?

jreback · 2018-03-20T00:14:56Z

pandas/plotting/_core.py

+                int_y = is_integer(y) or all(is_integer(c) for c in y)
+                if int_y and not data.columns.holds_integer():
+                    y = data_cols[y]
+                elif not isinstance(data[y], (ABCSeries, ABCDataFrame)):


where is the test for this (as you expanded to both Series & DataFrame)?

good point, removing

jreback · 2018-03-20T00:15:31Z

pandas/plotting/_core.py

+                int_y = is_integer(y) or all(is_integer(c) for c in y)
+                if int_y and not data.columns.holds_integer():
+                    y = data_cols[y]
+                elif not isinstance(data[y], (ABCSeries, ABCDataFrame)):


why are you actually selecting data[y] here? What else could data[y] be?

Yeah, this can probably be removed / replaced?

agreed, will remove

TomAugspurger · 2018-03-20T13:41:33Z

pandas/plotting/_core.py

-                label = kwds['label'] if 'label' in kwds else y
-                series = data[y].copy()  # Don't modify
-                series.name = label
+                int_y = is_integer(y) or all(is_integer(c) for c in y)


This looks fragile. For better or worse (mostly worse), you can put pretty much any object in pandas columns, e.g. df = pd.DataFrame({pd.Timestamp('2017'): [1, 2]}).

I think your code will fail for y=pd.Timestamp('2017'), since it's not an integer, but it also isn't iterable.

I'd recommend adding is_list_like(y) before the all(...) call

> you can put pretty much any object in pandas columns, e.g. df = pd.DataFrame({pd.Timestamp('2017'): [1, 2]})

good call
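A sketch of the failure described above and the suggested guard. `int_y_unguarded` mirrors the shape of the line under review and `int_y_guarded` adds `is_list_like(y)` before `all(...)`; both function names are hypothetical.

```python
import pandas as pd
from pandas.api.types import is_integer, is_list_like

y = pd.Timestamp("2017")  # a valid (if unusual) column label
df = pd.DataFrame({y: [1, 2]})


def int_y_unguarded(y):
    # Shape of the line under review: iterates y whenever it is not an integer.
    return is_integer(y) or all(is_integer(c) for c in y)


def int_y_guarded(y):
    # Suggested fix: only iterate when y is list-like.
    return is_integer(y) or (is_list_like(y) and all(is_integer(c) for c in y))


try:
    int_y_unguarded(y)
except TypeError as err:
    print("unguarded:", err)  # a Timestamp is not iterable

print("guarded:", int_y_guarded(y))  # guarded: False
print(df[y].tolist())                # the column itself is selectable: [1, 2]
```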

TomAugspurger · 2018-03-20T13:42:30Z

pandas/plotting/_core.py

+                int_y = is_integer(y) or all(is_integer(c) for c in y)
+                if int_y and not data.columns.holds_integer():
+                    y = data_cols[y]
+                elif not isinstance(data[y], (ABCSeries, ABCDataFrame)):


Yeah, this can probably be removed / replaced?

TomAugspurger · 2018-03-20T13:43:08Z

pandas/plotting/_core.py

+                label_kw = kwds['label'] if 'label' in kwds else False
+                new_data = data[y].copy()
+
+                if isinstance(data[y], ABCSeries):


Duplicate column names may mess that up, not sure if we allow that here though.

TomAugspurger

Looks good. May have a closer look tomorrow, but otherwise +1

masongallo · 2018-03-22T14:26:58Z

ping, tests green

TomAugspurger · 2018-03-22T14:31:42Z

Thanks @masongallo!

… x,y (pandas-dev#20000)

* Add support for list-like y argument
* update whatsnew
* add doc change for y
* Add test cases and fix position args
* don't copy save cols ahead of time and update whatsnew
* address fdbck

masongallo added 3 commits March 5, 2018 14:47

Add support for list-like y argument

19d64fe

update whatsnew

8361338

add doc change for y

20a2dc0

TomAugspurger reviewed Mar 5, 2018

masongallo mentioned this pull request Mar 8, 2018

BUG: df.plot fails when given x,y args as positions #20056

Closed

gfyoung added the Visualization (plotting) and Regression (functionality that used to work in a prior pandas version) labels Mar 8, 2018

Add test cases and fix position args

9535b4c

masongallo changed the title from "API: allow list-like y argument to df.plot" to "API & BUG: allow list-like y argument to df.plot & fix integer arg to x,y" Mar 8, 2018

jreback requested changes Mar 9, 2018

jreback requested changes Mar 10, 2018

masongallo mentioned this pull request Mar 15, 2018

API: Remove integer position args from xy for plotting #20371

Closed

TomAugspurger reviewed Mar 16, 2018

masongallo added 2 commits March 19, 2018 11:54

don't copy save cols ahead of time and update whatsnew

7d7d74e

Merge branch 'master' into plot-y-list

d6d824f

jreback requested changes Mar 20, 2018

TomAugspurger reviewed Mar 20, 2018

address fdbck

37a6eca

TomAugspurger approved these changes Mar 21, 2018

TomAugspurger merged commit 02477da into pandas-dev:master Mar 22, 2018
